Categorising the World into Local Climate Zones -- Towards Quantifying Labelling Uncertainty for Machine Learning Models
Image classification is often prone to labelling uncertainty. To generate
suitable training data, images are labelled according to the assessments of
human experts. This can result in ambiguities, which then affect any model
trained on these labels.
In this work, we aim to model the labelling uncertainty in the context of
remote sensing and the classification of satellite images. We construct a
multinomial mixture model given the evaluations of multiple experts. This is
based on the assumption that the image class itself is unambiguous and that
the ambiguity lies only in the experts' opinions about it. The model parameters can be
estimated by a stochastic Expectation Maximization algorithm. Analysing the
estimates gives insights into sources of label uncertainty. Here, we focus on
the general class ambiguity, the heterogeneity of experts, and the origin city
of the images. The results are relevant for all machine learning applications
where image classification is pursued and labelling relies on human annotators.
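The multinomial mixture model and its stochastic EM estimation can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' implementation: each image has one latent true class k drawn with probability pi[k], all experts vote independently with one shared class-confusion matrix theta (so expert heterogeneity and origin city are not modelled here), and the function names, the initialisation and the smoothing constant eps are hypothetical. Each iteration computes class posteriors (E-step), samples one class per image (S-step) and re-estimates the parameters from the sampled classes (M-step); the parameter chain is averaged after a burn-in.

```python
import numpy as np


def _posteriors(votes, pi, theta):
    """Posterior P(true class = k | votes of image i); rows sum to 1."""
    # The multinomial coefficient does not depend on k, so it cancels.
    log_r = np.log(pi) + votes @ np.log(theta).T
    log_r -= log_r.max(axis=1, keepdims=True)  # stabilise before exponentiating
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)


def stochastic_em(votes, n_iter=200, eps=1e-3, seed=0):
    """Fit a multinomial mixture to expert vote counts with stochastic EM.

    votes[i, j]: number of experts who assigned image i to class j.
    Returns (pi, theta, post) with
      pi[k]:       share of images whose latent true class is k,
      theta[k, j]: probability that an expert votes j when the true class is k,
      post[i, k]:  posterior probability that image i belongs to class k.
    """
    rng = np.random.default_rng(seed)
    votes = np.asarray(votes, dtype=float)
    n_obs, k = votes.shape
    # Hypothetical start: uniform class weights, mostly-correct experts.
    pi = np.full(k, 1.0 / k)
    theta = 0.8 * np.eye(k) + 0.2 / k
    burn_in = n_iter // 2
    pi_sum, theta_sum = np.zeros(k), np.zeros((k, k))

    for it in range(n_iter):
        # E-step: class posteriors under the current parameters.
        r = _posteriors(votes, pi, theta)
        # S-step: draw one latent class per image from its posterior.
        u = rng.random((n_obs, 1))
        z = np.minimum((r.cumsum(axis=1) < u).sum(axis=1), k - 1)
        # M-step: complete-data estimates; eps keeps every probability > 0.
        pi = (np.bincount(z, minlength=k) + eps) / (n_obs + k * eps)
        counts = np.zeros((k, k))
        np.add.at(counts, z, votes)  # sum the vote vectors per sampled class
        theta = (counts + eps) / (counts + eps).sum(axis=1, keepdims=True)
        # Average the parameter chain after burn-in.
        if it >= burn_in:
            pi_sum += pi
            theta_sum += theta

    n_kept = n_iter - burn_in
    pi_hat, theta_hat = pi_sum / n_kept, theta_sum / n_kept
    return pi_hat, theta_hat, _posteriors(votes, pi_hat, theta_hat)
```

Off-diagonal entries of the fitted theta indicate which classes the experts tend to confuse, which is one way to read off general class ambiguity from such a model.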
Towards Label Embedding -- Measuring classification difficulty
Uncertainty quantification in machine learning is a timely and vast field of
research. In supervised learning, uncertainty can already occur in the very
first stage of the training process, the labelling step. This is particularly
the case when not every instance can be unambiguously classified, because
classes overlap or instances cannot be clearly categorised. In other words,
there is inevitable ambiguity in the annotation step and not necessarily a
'ground truth'. We consider the classification of satellite images as an
example. Each image is annotated independently
by multiple labellers and classified into local climate zones (LCZs). For each
instance we have multiple votes, leading to a distribution of labels rather
than a single value. The main idea of this work is that we do not assume a
ground truth label but embed the votes into a K-dimensional space, with K as
the number of possible categories. The embedding is derived from the voting
distribution in a Bayesian setup, modelled via a Dirichlet-Multinomial model.
We estimate the model and posteriors using a stochastic Expectation
Maximisation algorithm with Markov Chain Monte Carlo steps. While we focus on
the particular example of LCZ classification, the methods developed in this
paper readily extend to other situations where multiple annotators
independently label texts or images. We also apply our approach to two other
benchmark datasets for image classification to demonstrate this. Besides the
embeddings themselves, we can investigate the resulting correlation matrices,
which can be seen as generalised confusion matrices and closely reflect the
semantic similarities of the original classes for all three exemplary
datasets. The insights gained are valuable, and the approach can serve as a
general label embedding whenever a single ground truth per observation cannot
be guaranteed.
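One way to turn a vote distribution into a K-dimensional embedding is the conjugate Dirichlet-Multinomial update sketched below. It is a simplified illustration, not the method of this work: it assumes a fixed, known prior concentration alpha (the work above estimates the model with stochastic EM and MCMC steps instead), uses the posterior mean as the embedding, and computes an ordinary Pearson correlation between class dimensions across instances as a stand-in for the correlation matrices mentioned above. The function names and the toy vote counts are hypothetical.

```python
import numpy as np


def dirichlet_embedding(votes, alpha):
    """Posterior-mean embedding of vote counts under a Dirichlet-Multinomial model.

    votes[i, j]: number of labellers assigning instance i to class j (K columns).
    alpha: Dirichlet prior concentration, shape (K,), all entries > 0.
    With p_i ~ Dir(alpha) and votes_i ~ Mult(n_i, p_i), the posterior is
    Dir(alpha + votes_i); its mean is a point in the K-dimensional simplex.
    """
    votes = np.asarray(votes, dtype=float)
    post = votes + np.asarray(alpha, dtype=float)  # posterior Dirichlet parameters
    return post / post.sum(axis=1, keepdims=True)


def class_correlation(embeddings):
    """Pearson correlation between embedding dimensions (classes) across instances."""
    return np.corrcoef(embeddings, rowvar=False)


# Toy example: 4 instances, 10 votes each, K = 3 classes.
votes = np.array([
    [8, 2, 0],  # mostly class 0, some confusion with class 1
    [5, 5, 0],  # genuinely ambiguous between classes 0 and 1
    [0, 1, 9],
    [1, 0, 9],
])
emb = dirichlet_embedding(votes, alpha=np.ones(3))
corr = class_correlation(emb)
```

In this toy example, classes 0 and 1, which the labellers confuse, come out positively correlated, whereas classes 0 and 2 are negatively correlated, illustrating how such a matrix can act like a generalised confusion matrix.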
Advances in Uncertainty-Guided Local Climate Zone Classification
Like many other research fields, remote sensing has been greatly impacted by machine and deep learning and benefits from technological and computational advances. In recent years, considerable effort has been spent on deriving not just accurate but also reliable models that yield a measure of predictive uncertainty. In image classification, for example, reliability is validated by cross-checking the model's confidence in its predictions against the resulting accuracy. Predictive uncertainties, on the other hand, can be used to identify expressive data samples. We investigate model reliability in the framework of Local Climate Zone (LCZ) classification, using the So2Sat LCZ42 [1] data set, which consists of Sentinel-1 and Sentinel-2 image pairs.
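Cross-checking confidence against accuracy is commonly summarised by the expected calibration error (ECE), and predictive entropy is one common way to rank samples by uncertainty. The NumPy sketch below shows both. It assumes softmax outputs of shape (N, K) and integer labels; the equal-width binning with 10 bins, the function names and the use of entropy to pick "expressive" samples are illustrative choices, not necessarily the procedure used in this work.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted average gap between confidence and accuracy over confidence bins."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    confidence = probs.max(axis=1)  # top-class probability
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin b covers (edges[b], edges[b + 1]]; a confidence of 0 falls into bin 0.
    bin_ids = np.digitize(confidence, edges[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidence[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by the share of samples in the bin
    return ece


def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of each predicted class distribution."""
    probs = np.asarray(probs, dtype=float)
    return -(probs * np.log(probs + eps)).sum(axis=1)


def most_uncertain(probs, m):
    """Indices of the m samples with the highest predictive entropy."""
    return np.argsort(-predictive_entropy(probs))[:m]
```

A model that predicts class 0 with probability 0.9 for every sample but is right only half of the time has an ECE of 0.4, i.e. it is overconfident; a perfectly calibrated model has an ECE of 0.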
[1] X. X. Zhu, J. Hu, C. Qiu, Y. Shi, J. Kang, L. Mou, H. Bagheri, M. Häberle, Y. Hua, R. Huang et al., “So2Sat LCZ42: A benchmark data set for the classification of global local climate zones [Software and Data Sets],” IEEE Geoscience and Remote Sensing Magazine, vol. 8, no. 3, pp. 76–89, 2020.